Structural organization and sequence diversity of the complete nucleotide sequence encoding the Plasmodium malariae merozoite surface protein-1

The merozoite surface protein-1 (MSP1) is a prime candidate for an asexual blood stage vaccine against malaria. However, polymorphism in this antigen could compromise the vaccine’s efficacy. Although the extent of sequence variation in MSP1 has been analyzed from various Plasmodium species, little is known about structural organization and diversity of this locus in Plasmodium malariae (PmMSP1). Herein, we have shown that PmMSP1 contained five conserved and four variable blocks based on analysis of the complete coding sequences. Variable blocks were characterized by short insertion and deletion variants (block II), polymorphic nonrepeat sequences (block IV), complex repeat structure with size variation (block VI) and degenerate octapeptide repeats (block VIII). Like other malarial MSP1s, evidences of intragenic recombination have been found in PmMSP1. The rate of nonsynonymous nucleotide substitutions significantly exceeded that of synonymous nucleotide substitutions in block IV, suggesting positive selection in this region. Codon-based analysis of deviation from neutrality has identified a codon under purifying selection located in close proximity to the homologous region of the 38 kDa/42 kDa cleavage site of P. falciparum MSP1. A number of predicted linear B-cell epitopes were identified across both conserved and variable blocks of the protein. However, polymorphism in repeat-containing blocks resulted in alteration of the predicted linear B-cell epitope scores across variants. Although a number of predicted HLA-class II-binding peptides were identified in PmMSP1, all variants of block IV seemed not to be recognized by common HLA-class II alleles among Thai population, suggesting that diversity in this positive selection region could probably affect host immune recognition. The data on structural diversity in PmMSP1 could be useful for further studies such as vaccine development and strain characterization of this neglected malaria parasite.

period and can recrudesce after many years of dormancy [9][10][11] . Like other human malaria parasites, P. malariae has been incriminated in transfusion-transmitted malaria in which the prevalence seems to vary across endemic areas 12,13 . Meanwhile, the low parasite density of P. malariae among infected individuals has hampered efficient detection by conventional microscopy, especially when it co-infects with other malaria species [14][15][16] . On the basis of microscopy diagnosis, P. malariae infection accounted for approximately 0.1% of all malaria cases in Thailand 17 whereas PCR could diagnose about five times higher than microscopic examination 15,[18][19][20] . To achieve malaria control and elimination, it may require effective interventions against the low prevalent Plasmodium species including P. malariae.
One of the leading vaccine candidates against asexual blood stages of malaria parasites is merozoite surface protein-1 (MSP1) which is believed to play a crucial role in invasion of host erythrocytes by the merozoites and during their egression from infected cells after asexual reproductive maturation 21,22 . The MSP1 of P. falciparum (PfMSP1) is synthesized as a precursor protein during schizogony and subsequently processed into 4 polypeptides of 83, 30, 38 and 42 kDa. Prior to erythrocyte entry of the merozoites, secondary processing of the C-terminal 42-kDa fragment ensues, yielding 33-and 19-kDa protein fragments 23 . On the basis of amino acid sequence identity, PfMSP1 have been divided in to 17 blocks, containing five conserved, five semi-conserved and seven variable blocks 24 . The 19-kDa fragment containing two epidermal growth factor (EGF)-like domains has been considered to be a potential vaccine candidate because it is a target for invasion inhibitory antibodies while naturally acquired antibodies against this fragment have been associated with protection against symptomatic malaria among individuals living in malaria endemic areas 25,26 . Likewise, the tripeptide repeats in PfMSP1 could elicit protective antibodies among children in Sub-Saharan Africa 27 . Furthermore, erythrocyte-binding domains have been identified in the 83-, 38-and 33-kDa fragments of PfMSP1 [28][29][30][31] . Meanwhile, the MSP1 gene of P. vivax (PvMSP1) displays mosaic organization of variable blocks whereas those of P. ovale spp. (PoMSP1) and P. knowlesi (PkMSP1) exhibit structural variation that are different from that of PfMSP1 [32][33][34] .
To date, mainly partial sequences of the MSP1 gene of P. malariae (PmMSP1) have been determined using isolates from French Guiana, Cameroon, Brazil and Thai-Myanmar border which reveals conserved and variable blocks [35][36][37][38][39] . However, the organization of these blocks based on the complete coding sequences remains to be elucidated. Herein, we analyzed the complete coding sequence of PmMSP1 among clinical isolates from diverse endemic areas of Thailand. Results have shown that PmMSP1 contained four variable blocks flanked by five conserved blocks. Like other human malarial MSP1s, intragenic recombination and natural selection have influenced diversity at this locus 24,[32][33][34] . Furthermore, analysis of predicted linear B-cell and helper T-cell epitopes has suggested that polymorphism in this protein could affect host immune recognition.

Results
Amplification and sequencing of PmMSP1. The  www.nature.com/scientificreports/ parum (n = 2), P. vivax (n = 17) and both P. falciparum and P. vivax (n = 1). However, the PCR primers used in this study were specific for amplification of PmMSP1 because direct sequencing of the PCR-amplified products yielded clear electropherogram without superimposed signals of the sequences of this locus. Therefore, no cross amplification of PfMSP1 and PvMSP1 was observed in isolates containing P. falciparum or P. vivax. The complete coding sequences of PmMSP1 in this study varied from 5088 to 5493 bp. In total, 20 alleles were identified in which alleles III, VI, X, XIII and XV contained more than one isolate (Fig. 1). Interestingly, the same alleles could be found from different sampling periods and from diverse endemic areas of the country.

Structural organization of PmMSP1.
To determine the structural organization of PmMSP1, nucleotide diversity was determined across the aligned complete coding sequences of 35 Thai isolates and the sequence from a Cameroonian patient (GenBank accession no. FJ824669) whose nucleotide and amino acid positions of the gene/protein were used as reference. Results revealed that the extent of nucleotide diversity was variable across PmMSP1 with two regions containing nucleotide diversity > 0.04 ( Fig. 2A), a comparable level for variable blocks of PvMSP1 32 . One was from codons 210 to 241, designated block IV, and the other was in block VI spanning amino acids 682 and 832. Short insertions and deletions (indels) were found between codons 57 and 69 at the N-terminal part of PmMSP1, designated block II, (Fig. 3). Meanwhile, Tandem Repeats Finder algorithm has identified two repeat-containing regions in PmMSP1, one corresponding to block VI and the other from codons 989 to 1032 (block VIII). Therefore, the remaining nonrepeat regions encompassing approximately 86% of the entire coding region in PmMSP1 with nucleotide diversity < 0.02 were assigned to conserved blocks, consisting of blocks I, III, V, VII and IX (Fig. 2B).

Diversity of indels in PmMSP1.
Previous reports have shown that P. malariae and P. brasilianum possessed similar or almost indistinguishable genetic background 38,40,41 . To gain further insight into sequence diversity in PmMSP1, the previously reported partial sequences of PmMSP1 and the MSP1 sequences of P. brasilianum (PbrMSP1) were included for comparison between the corresponding regions 38,39 . Despite short indels in block II, nine variants were identified. Of these, six variants occurred in Thai isolates whereas five variants were found in PbrMSP1 in which alleles V and VI were shared between PmMSP1 and PbrMSP1 (Table 1).    (Table 2). Of these, 10 haplotypes were identified among Thai isolates in which three haplotypes were shared across endemic countries. All haplotypes of available PbrMSP1 sequences (n = 4) were distinct from those of PmMSP1. However, allele VI from four Brazilian isolates (GenBank accession nos. KR072269, KR072272, KR072278 and KR072279) and allele XV from a Peruvian Saimiri monkey (KR072284) were closely related with a single amino acid difference (I233K) 38 . It is noteworthy that seven of nine amino acid substitutions in block IV of PbrMSP1 were shared with those of PmMSP1 ( Table 2).

Diversity of repeats in PmMSP1.
Block VI contained complex repeat motifs with multiple patterns of repeat arrays and arrangements. Together with previously reported sequences of PmMSP1 (n = 16) and PbrMSP1 (n = 5), 35 haplotypes have been identified in this block in which 19 haplotypes occurred among Thai isolates (Fig. 3). The N-terminal part of this block contains non-repetitive amino acid sequences with variable indels, resulting in eight to 40 residues in this region. On the basis of distinct repeats and arrangements, block VI could be classified into types A and B. Type A contained 17 alleles (A1.1-A1.17) whereas type B could be further subdivided into subtypes B1 and B2, containing eight and 10 alleles, respectively (Fig. 3). Interestingly, the amino acid sequence of allele A1.10 was shared between P. malariae from a Brazilian patient (KR072216) and P. brasilianum from a Peruvian Saimiri monkey (JX045641) whereas the remaining PbrMSP1 and most other PmMSP1 type A alleles seemed to be closely related. It is noteworthy that none of PbrMSP1 sequences belonged to type B. Meanwhile, the other repeat-containing region was located in block VIII spanning codons 989 and 1032 (residues after FJ824669), characterized by a degenerate octapeptide repeat motif, P(A)Q(T)P(S, T or Q)QA(S)A(S or T)L(S or V)P(V or -), with variation in the number of repeat units among isolates. Of 35 Thai isolates and 14 previously reported sequences, 13 haplotypes were identified in this block in which haplotype I was most common and occurred in isolates from Thailand, Myanmar and Brazil, followed by haplotype XIII which was shared between PmMSP1 and PbrMSP1 35,38,39 (Table 3).
Microheterogeneity in conserved blocks. The complete sequences of all 5 conserved blocks have been available from 35 Thai isolates and an isolate from Cameroon (FJ824669). All nucleotide substitutions in conserved blocks were dimorphic, i.e. either one or the other of any two bases occurred at given positions. In total, 37 mutations were observed in conserved blocks, resulting in three haplotypes in blocks I and III, nine in block V, four in block VII and 13 in block IX ( Table 4). The levels of nucleotide diversity in conserved regions ranged from 0.00098 to 0.00232 in blocks I and VII, respectively, which was an order or two orders of magnitude less than those in variable blocks (blocks IV, VI and VIII). Since the 19-kDa fragment of PfMSP1 has been con- www.nature.com/scientificreports/ sidered to be an asexual blood stage vaccine target 26 , microheterogeneity in this region is of concern for vaccine development. Analysis of the homologous region to the 19-kDa-fragment-encoding sequence in PmMSP1 has revealed 4 nucleotide substitutions: c.5045G>A (G1681E), c.5055A>C (E1684D), c.5060A>T (E1686V) and c.5074C>A (Q1691K) (positions after the FJ824669 sequence). In total, four haplotypes occurred in the puta- in which haplotype I was found in the Cameroonian isolate (FJ824669) whereas the remaining haplotypes coexisted among P. malariae populations in Thailand. Of these four substituted residues, c.5044G>C+5045G>A (G1681Q) was found in three isolates from Brazil and constituted another haplotype characterized by Q-E-E-K although singletons were previously observed in other five positions of this region 38 Brazil (8) Fig. S1).

Recombination.
Evidence of intragenic recombination in the PmMSP1 gene was determined from 35 Thai isolates by using the RDP4 package which revealed 21 potential recombination sites across the coding region of this gene (Table 5). Recombination breakpoints were detected more commonly in repeats or variable blocks (28 of 42 sites, 66.7%) than in conserved blocks. On the other hand, no recombination event was detected in conserved blocks I, II and V. Recombination breakpoints spanned 41-3978 bp with an average length of 784 bp.
Phylogenetic analysis. Analysis of the complete coding sequences of PmMSP1 has revealed two distinct clades in the phylogenetic tree (Fig. 4). The maximum likelihood tree inferred from the sequences of block VI per se has revealed 2 clades corresponding to characteristic repeats assigned to types A and B. It is noteworthy that the bifurcating clusters of taxa in the clade belonging to type B were in line with the isolates bearing types B1 and B2 repeats (Figs. 3, 4). On the other hand, the tree inferred from the sequences excluding block VI showed a different topology.
Predicted linear B-cell epitopes. The graphical presentation from BepiPred 2.0 analysis has revealed a number of potential linear B-cell epitopes across PmMSP1, spanning both conserved and variable blocks (Fig. 5A). Short indels in block II did not affect predicted B-cell epitopes encompassing this region. Interestingly, amino acid substitutions in variable block IV seemed not to affect predicted linear B-cell epitopes in all variants (Fig. 5B). On the other hand, the predicted epitope scores were variable among different alleles of blocks VI and VIII (Fig. 5C,D). Variation in the predicted scores was more pronounced among variants in block VI in which some regions were below the cutoff threshold value for being linear B-cell epitopes.  Fig. S2). Block IV did not receive adequate scores for being HLA-class II-binding peptides (percentile rank < 10 and MHC binding affinity IC 50 < 1000 nM) for these common HLA class II alleles 42,43 . However, searching for potential HLA-class II-binding peptides among alleles spanning block IV from residues 207 to 221 has shown that alleles II, IV and V had percentile ranks less than 10 and MHC binding affinity IC 50 < 1000 nM for some uncommon HLA class II alleles in Thailand 44 . On the other hand, a potential www.nature.com/scientificreports/ HLA-class II-binding peptide was identified in one of nine alleles (allele V) of block IV encompassing residues 211-225 (Table 6). Taken together, these peptide variants could be potential helper T-cell epitopes in this molecule albeit being recognized by some uncommon HLA class II alleles among Thai population 44 .

Discussion
In this study, we have shown that the complete coding sequence of PmMSP1could be partitioned into five conserved and four variable blocks. Like other malarial MSP1s, conserved blocks of PmMSP1 exhibited microheterogeneity of sequences with dimorphic nucleotide substitutions 24,[32][33][34]45,46 . Comparative analysis has revealed that short indels in block II of PmMSP1 seemed to be homologous to a short indel region at the 5′ portion of PvMSP1. Likewise, variable nonrepeat block IV of PmMSP1 were found to be homologous to variable nonrepeat blocks of PoMSP1, and repeat domains of PkMSP1 and PvMSP1 (Supplemental Fig. 3). Likewise, repeat blocks    Fig. 3). Meanwhile, the distantly related PfMSP1 also contained repeats in blocks VIII homologous to block VI of PmMSP1. Although variable and semi-conserved blocks of PfMSP1 consisted of two distinct parental alleles (MAD20 and K1), sequences of these regions were highly conserved within each allelic family 24,45 . Therefore, intraspecific conserved blocks of these malarial MSP1 genes seemed to be largely found in corresponding locations. Taken together, the similarity in primary structural organization of MSP1s across Plasmodium species may suggest that this locus has evolved from a common ancestral sequence whereas the lack of homologous regions in some domains of the genes among species could imply post-speciation evolution of individual MSP1 lineages. Consistently, it has been suggested that positive selection could influence lineage-specific evolutionary history of some human and simian malarial MSP1s 47 . It is noteworthy that the levels of nucleotide diversity of PkMSP1, PvMSP1 and PfMSP1 were comparable among Thai isolates. On the other hand, the level of nucleotide diversity of PmMSP1 was significantly less than those of PkMSP1, PvMSP1 and PfMSP1 but remarkably greater than those of PoMSP1 19,32,33,[48][49][50] . Consistent findings were observed when analysis was performed separately for synonymous (π S ) and nonsynonymous sites (π N ) (Supplemental Table 2). The extent of nucleotide diversity among MSP1s of different Plasmodium species in Thailand could be due to evolutionary and population genetic forces on parasite populations such as mutation, recombination and population processes. Meanwhile, the neutral theory of molecular evolution predicts that the level of nucleotide diversity is proportional to the mutation rate (μ) and effective population size (N e ) under mutation-drift equilibrium 51 . Since the mutation rates of malarial MSP1 genes seemed to be similar across species [52][53][54] , variation in the levels of nucleotide diversity of these loci could be due to the difference in effective population sizes among Plasmodium species in Thailand. Although some regions or residues in malarial MSP1s were deviated from selective neutrality, the remaining majority of sequences seemed to be under neutral evolution. Therefore, the level of nucleotide diversity may roughly reflect the number of breeding individuals in the population. On the other hand, the low level of nucleotide diversity in PmMSP1 could represent the low transmission rate and probably from bottleneck effects due to malaria control measures as previously noted 38,46 . Our previous surveys of malaria in Thailand have shown that the prevalence of P. malariae and P. knowlesi in Thailand was comparable 15,[18][19][20] . Therefore, it is likely that the higher level of nucleotide diversity of PkMSP1 than that of PmMSP1 could stem from a hidden large reservoir of P. knowlesi in its macaque natural hosts in this country 33,55 . On the one hand, the haplotype diversity of most malarial MSP1 genes in Thailand was relatively high (> 0.9), implying that distinct or rare haplotypes were abundant in the populations (Supplemental Table S2). On the other hand, we observed some predominant PmMSP1 haplotypes in this country, i.e. haplotypes III, XIII and X, which occurred across endemic provinces and between long time intervals of sample collections (Fig. 1), suggesting that the parasites bearing these haplotypes could probably have reproductive advantage.
Conserved blocks in PmMSP1 displayed microheterogeneity of sequences in which nucleotide substitutions seems to have evolved neutrally because block-wise analysis revealed that d S was not significantly different from d N (Table 4). However, codon-based analysis has identified four positively selected codons in conserved blocks, www.nature.com/scientificreports/ suggesting that natural selection has influenced evolution of particular codons. Interestingly, one of these codons (residue E1684D) was located between the two EGF-like domains at the C-terminal part of PmMSP1 in which the homologous region in PfMSP1 has been a target for naturally acquired antibodies associated with clinical protection against falciparum malaria 26 . Intriguingly, positive selection in the EGF-like domain of PmMSP1 could probably be driven by host immune pressure. On the other hand, evidence for purifying selection was detected at codon 1374 (GAT ⟶ GAC) that was located in close proximity to the canonical 38 kDa/42 kDa cleavage site in PfMSP1 56 . Importantly, cleavage at this site has been shown to be a rate-limiting processing step, suggesting its pivotal role for MSP1 proteolytic maturation 57,58 . Therefore, deviation from selective neutrality occurred at particular residues in conserved regions of PmMSP1.
The variable nonrepeat block IV of PmMSP1 spanned 32 codons with 21 amino acid substitutions, resulting in 16 alleles among Thai and global isolates ( Table 2). The significant difference in d N exceeding d S in this block implies that positive selection could influence diversity in this region (Table 4). On the basis of amino acid alignment, block IV of PmMSP1 was homologous to block III of PfMSP1, a portion of the 83-kDa fragment which forms a flexible wing domain of the protein as demonstrated by single-particle cryo-electron microscopy 31 . Several lines of evidence have suggested that MSP1 could be detected as monomeric and dimeric forms [58][59][60] . It has been shown that dimerization of PfMSP1 involves the interaction between the 83-kDa and 42-kDa fragments 31 . Although the significance of dimerization of PfMSP1 remains unknown, the protective capability against falciparum malaria conferred by natural antibodies to the 83-kDa fragment could suggest the functional importance of this region 61 . Importantly, in silico analysis has shown that block IV of PmMSP1 contained both B-cell and helper T-cell epitopes. Consistently, recombinant proteins derived from various regions of PmMSP1 including the N-terminal fragment elicited strong immunogenicity in mice 62 and were highly recognized in serum samples of primates and non-human primates from malaria endemic areas 63,64 . Although allelic variation in block IV of this protein seemed not to drastically change the propensity of being B-cell epitopes as predicted by the IEDB analysis resource (Fig. 5B), amino acid substitutions in this region were unlikely recognized by common HLA class II alleles among Thai population (Fig. 4B, Table 6, Supplemental Fig. S2). Importantly, mutations in block IV may reduce or totally abolish predicted binding capability of the peptides to some uncommon HLA class II alleles in Thai population (Table 6; Supplemental Fig. S2). Undoubtedly, further studies are required to address the immunological significance of helper T-cell epitopes in block IV of PmMSP1. Therefore, it seemed that positive selection in block IV could probably be driven by host immune pressure.
Repetitive amino acid sequences have been observed in several malarial antigens including MSP1s. Our analysis has revealed two repeat-containing regions in blocks VI and VIII of PmMSP1. Unlike block VIII that contained degenerate octapeptide motifs, repeats in block VI were more complex, characterized by a repertoire of different repeat arrays and arrangements. Meanwhile, the RDP4 package has identified 21 recombination breakpoints in PmMSP1. Interestingly, about half of recombination events involved block VI whereas about one-third of the breakpoints occurred within this block. Besides slip-strand mispairing mechanism that could generate sequence and size variation in repeat sequences, recombination may contribute to shuffle of repeat units in block VI. Although a number of linear B-cell epitopes were predicted in this block (Fig. 5A), in silico analysis has suggested that variation in repeat sequences could affect antibody recognition (Fig. 5C). Meanwhile, phylogenetic tree inferred from the block VI sequences of PmMSP1 has revealed two distinct clades, corresponding to repeat sequence types A and B of this block. Importantly, variation in the number of repeat units could affect intensity of antibody reactivity whereas distinct variants of repetitive antigens may abolish specific antibody response as shown by antibody recognition of repeat antigens in block II of PfMSP1 65,66 . Therefore, sequence divergence of repetitive regions in PmMSP1 could probably enhance host immune evasion by the parasites.
One of the shared features of PmMSP1 and PvMSP1 was the presence of indels near the N-terminus of the proteins. Although indels in block II spanned 17 codons, 6 alleles have been identified among Thai isolates (Table 1). Indels are commonly found in both coding and noncoding regions of prokaryotes and eukaryotes genomes while they may occur within repeats and nonrepeat regions 67,68 . The generation of indels related with repeats could be due to polymerase slippage 69,70 . On the other hand, the formation of indels in nonrepeat regions required pre-existing palindromic or quasi-palindromic sequences, provoking a double-stranded break intermediate during DNA replication while the ensuing repair process was imperfect [71][72][73][74] . It is noteworthy that quasipalindromic repeats were identified around indels of PmMSP1 and PvMSP1, supporting the mechanisms for indel formation in nonrepeats of these genes (Supplemental Fig. S4). Although analysis of natural selection on these indels was not possible due to unknown ancestral state of this region, the lack of frame-shift mutation following indels in both PmMSP1 and PvMSP1 could imply selective constraint on the protein structure and/or function.
Several lines of evidence have suggested that P. malariae and P. brasilianum were de facto either con-species or the same parasites 38,40,41 . A repertoire of alleles in block VI constituting the most polymorphic region of the gene has been identified among PmMSP1 and PbrMSP1 ( Fig. 2; Table 4). Importantly, allele A1.10 of block VI was shared between the MSP1 genes of P. malariae and P. brasilianum whereas allele XIII of block VIII has been previously reported to occur in both species 38 . Like other genes or non-coding loci containing repeats in malarial genomes, variation in repeat sequences and the number of repeat units could be generated by the process of slip-strand mispairing mechanism 75,76 . Therefore, it is unlikely that identical complex repeats could have arisen from homoplasy. Furthermore, shared alleles between PmMSP1 and PbrMSP1 have been observed in variable blocks II (alleles V and VI) and VIII (allele XIII) whereas a single codon difference was observed between alleles VI and XV in variable block IV (alleles VI and XV) (Tables 1, 2, 3). Taken together, it is likely that P. malariae and P. brasilianum could be identical species or at least con-species as previously noted 38,41 .
In conclusion, analysis of the complete coding sequences of PmMSP1 from clinical isolates has revealed structural organization of this locus. Besides structural similarity across human malarial MSP1s, evidences of intragenic recombination and natural selection have been identified in PmMSP1. The information from this

Materials and methods
Parasite isolates. Thirty-five Plasmodium malariae isolates were obtained from symptomatic malaria patients during surveys of Plasmodium species distribution in Thailand during 1994 and 2016 (Fig. 1). Either finger-pricked or venous blood samples were taken from each subject and spotted onto filter papers or preserved in EDTA, respectively. Both thin and thick blood films were prepared from fresh blood and stained with Giemsa solution for microscopic examination of malaria parasites. DNA was extracted from each blood sample using Qiagen DNA mini kit (Qiagen, Hilden, Germany) per the manufacturer's recommendation and kept at − 40 °C until use. Definite species identification was performed by species-specific nested PCR targeting 18S rRNA, mitochondrial cytochrome b or cytochrome oxidase I as previously described 15,19,20 .
PCR amplification and sequencing of the PmMSP1 gene. The complete coding sequence of PmMSP1 was amplified by nested PCR using outer primers: Pmmsp1F0 (5′-TAC TCT ATA TTA TCA AGT TTA ATT C-3′) and Pmmsp1R0 (5′-CAT TCG TAT CCT TCT TTT CTGT-3′), and inner primers: Pmmsp1F01 (5′-GTT TAA TTC AAA AAT GAA AGCAC-3′) and Pmmsp1R01 (5′-TCT TTT TTT CTT AAA GTA AGT TAA AC-3′). Amplification reaction and condition were as previously described 34 . All amplification reactions were done in an Applied Biosystem GeneAmpH PCR System 9700 thermocycler (PE Biosystems, Foster City, CA). PCR products were analyzed by 1% agarose gel electrophoresis. The PCR products were purified by using a QIAamp PCR purification kit (Qiagen) and used as templates for sequencing. Sequencing primers were deployed to obtain overlapping sequences of the gene in which both directions were determined directly from the PCR-purified templates (Supplemental Table S1). Validation of singletons and indels in the sequences was performed by sequencing of the PCR products from independent amplification reactions using the same genomic DNA as templates.
Data analysis. Alignment of the PmMSP1 nucleotide sequences was performed by using the default option of the MUSCLE program and manually edited 77 . Indels in coding regions were determined from multiple alignments of amino acid sequences to maintain the reading frame. The sequence of the first complete PmMSP1 gene from a Cameroonian patient was used as reference (GenBank accession number FJ824669) 36 . Tandem repeats were analyzed by scanning each sequence using window sizes per the default option of the Tandem Repeats Finder version 4.0 algorithm 78 . Nucleotide diversity was computed from the average number of nucleotide differences per site between two sequences in the sample and the standard errors were estimated by 1000 boostrap pseudoreplicates 79 . A sliding window analysis of nucleotide diversity was performed by using window length of 100 nucleotides and step size of 15 sites. Haplotype diversity and its sampling variance were determined by using the DnaSP program 80 . The number of synonymous substitutions per synonymous site and the number of nonsynonymous substitutions per nonsynonymous site was computed using Nei and Gojobori's method 79 with Juke and Cantor correction 81 . Standard errors of these parameters were estimated by the bootstrap method with 1000 pseudoreplicates using the MEGA 6.0 program 82 . Differences between the nucleotide diversity values were determined by a two-tailed Z-test. Deviation from selective neutrality at individual codons was identified using fast unconstrained Bayesian approximation (FUBAR) method implemented in the Datamonkey Web-Server 83,84 .
To minimize the interfering signals from recombination on selection of individual codons, the data generated by elimination of recombination segments was deployed for analysis 85 . Determination of intragenic recombination was performed by using the Recombination Detection Program version 4 (RDP4) 86 . Phylogenetic trees were constructed by using the Maximum Likelihood method based on the General Time Reversible model with a discrete Gamma distribution to model evolutionary rate differences among sites 84 . The final tree is the one with the highest log likelihood value. Bootstrap supports for the branching patterns were estimated from 1000 pseudoreplicates of the sample data. Prediction of linear B-cell epitopes was done by using the BepiPred linear epitope prediction 2.0 implemented in the Immune Epitope Database (IEDB) And Analysis Resource 87 . The threshold for linear B-cell epitopes was more than or equal to the average predicted residue score of the protein. The HLAclass II-binding peptides were predicted by using the IEDB recommended 2.22 algorithm with a default 12-18 residues option 88 . The criterion for being HLA-class II-binding peptides included the percentile rank ≤ 10 and the IC 50 threshold for MHC binding affinity ≤ 1000 nM 43 . The common HLA class II haplotypes among Thai population were based on allele frequency ≥ 0.1 according to the previous report 44 .
Ethical approval. The study protocol was approved by the Institutional Review Board on Human Research of Faculty of Medicine, Chulalongkorn University (IRB No. 384/60 and COA No. 805/2018). Written informed consent was obtained from participants or from parents or guardians prior to blood sample collections. All procedures were performed in accordance to the relevant guidelines and regulations.

Data availability
Thirty-five complete coding sequences of PmMSP1 have been deposited in NCBI GenBank under accession numbers OM525734-OM525768. The datasets generated during and/or analyses during the current study are available from the corresponding author upon request.